73 research outputs found

    Bioinformatics for personal genome interpretation

    An international consortium released the first draft sequence of the human genome 10 years ago. Although the analysis of this data has suggested the genetic underpinnings of many diseases, we have not yet been able to fully quantify the relationship between genotype and phenotype. Thus, a major current effort of the scientific community focuses on evaluating individual predispositions to specific phenotypic traits given their genetic backgrounds. Many resources aim to identify and annotate the specific genes responsible for the observed phenotypes. Some of these use intra-species genetic variability as a means for better understanding this relationship. In addition, several online resources are now dedicated to collecting single nucleotide variants and other types of variants, and annotating their functional effects and associations with phenotypic traits. This information has enabled researchers to develop bioinformatics tools to analyze the rapidly increasing amount of newly extracted variation data and to predict the effect of uncharacterized variants. In this work, we review the most important developments in the field: the databases and bioinformatics tools that will be of utmost importance in our concerted effort to interpret the human variome. © The Author 2012. Published by Oxford University Press.
    Capriotti, Emidio; Nehrt, Nathan L.; Kann, Maricel G.; Bromberg, Yana

    Workshop during the Pacific Symposium of Biocomputing, Jan 3-7, 2019: Reading between the genes: interpreting non-coding DNA in high-throughput

    Identifying functional elements in non-coding DNA, and deriving mechanistic insight from non-coding variation, remains a challenge. Advances in genome-scale, high-throughput technology have brought these answers closer than ever, though new computational approaches to analysis and integration are still needed. This workshop aims to explore these resources and new computational methods applied to regulatory elements, chromatin interactions, non-protein-coding genes, and other non-coding DNA.
    Open access journal. UA Faculty Publications collection, University of Arizona.

    Threshold Average Precision (TAP-k): a measure of retrieval designed for bioinformatics

    Motivation: Since database retrieval is a fundamental operation, the measurement of retrieval efficacy is critical to progress in bioinformatics. This article points out some issues with current methods of measuring retrieval efficacy and suggests some improvements. In particular, many studies have used the pooled receiver operating characteristic for n irrelevant records (ROCn) score, the area under the ROC curve (AUC) of a 'pooled' ROC curve, truncated at n irrelevant records. Unfortunately, the pooled ROCn score does not faithfully reflect actual usage of retrieval algorithms. Additionally, a pooled ROCn score can be very sensitive to retrieval results from as few as a single query.
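The pooled ROCn score described in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly used normalization, in which the truncated area is divided by n times the number of relevant records, and it assumes that a list ending before n irrelevant records is padded with irrelevant ones. The function names `rocn` and `pooled_rocn` and the example scores are hypothetical, and the sketch does not implement TAP-k itself.

```python
from typing import List, Sequence, Tuple

Hit = Tuple[float, bool]  # (retrieval score, is the record relevant?)


def rocn(ranked_relevance: Sequence[bool], total_relevant: int, n: int) -> float:
    """Normalized area under the ROC curve truncated at n irrelevant records.

    ranked_relevance lists the retrieved records best-first.
    """
    if total_relevant <= 0 or n <= 0:
        raise ValueError("total_relevant and n must be positive")
    true_pos = 0
    false_pos = 0
    area = 0
    for is_relevant in ranked_relevance:
        if is_relevant:
            true_pos += 1
        else:
            false_pos += 1
            area += true_pos  # relevant records ranked above this irrelevant one
            if false_pos == n:
                break
    # Assumption: if the list ends early, the missing irrelevant records
    # are treated as ranked after everything that was retrieved.
    area += true_pos * (n - false_pos)
    return area / (n * total_relevant)


def pooled_rocn(results_per_query: List[List[Hit]],
                relevant_per_query: List[int], n: int) -> float:
    """Merge the hits of all queries by raw score, then compute one ROCn."""
    pooled = [hit for hits in results_per_query for hit in hits]
    pooled.sort(key=lambda hit: hit[0], reverse=True)
    return rocn([rel for _, rel in pooled], sum(relevant_per_query), n)


# Hypothetical data: query 1 returns two high-scoring irrelevant records.
query1 = [(9.0, False), (8.0, False), (1.0, True)]
query2 = [(5.0, True), (4.0, True), (3.0, False)]

print(rocn([rel for _, rel in query2], total_relevant=2, n=2))
print(pooled_rocn([query1, query2], relevant_per_query=[1, 2], n=2))
```

In this made-up example, query 2 on its own ranks both of its relevant records first, yet the two high-scoring irrelevant hits from query 1 use up the whole budget of n = 2 irrelevant records in the pooled list. That is the kind of single-query sensitivity the abstract points to.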

    Protein sequence alignment: Theory, algorithms, and optimal score function.

    The growth in protein sequence data has placed a premium on ways to infer the structure and function of newly sequenced proteins. One of the most effective ways is to identify a homologous relationship with a protein about which more is known. This methodology, also known as sequence comparison, is addressed in this thesis. Using a novel optimization-iteration procedure, we obtain a new score function to improve the detection of distant homologs with sequence comparison. In addition, we assess the performance of a recently developed alignment algorithm for sequence comparison and examine the null statistics of the scores obtained with this algorithm. The analysis begins by introducing the current techniques for sequence alignment and the interpretation of the scores obtained with such procedures. All of these methods rely on some score function to measure sequence similarity. We describe a new method of determining a score function, optimizing the ability to discriminate between homologs and non-homologs. We find that this new score function (OPTIMA) performs better than standard score functions for the identification of distant homologies. A detailed analysis of the performance of hybrid, a new sequence alignment algorithm developed by Yu and co-workers that combines Smith-Waterman local dynamic programming with a local version of the maximum-likelihood approach, was made in order to assess the applicability of this algorithm to the detection of distant homologs by sequence comparison. We analyzed the statistics of hybrid with a set of non-homologous protein sequences from the SCOP database and found that the scores from the hybrid algorithm follow an extreme value distribution with lambda ∼1, as previously demonstrated by Yu et al. for the case of artificially generated sequences.
    The ability of dynamic programming to discriminate between homologs and non-homologs in the two sets of distantly related sequences is slightly better than that of the hybrid algorithm. The advantage of producing accurate score statistics with only a few simulations may outweigh the small differences in performance and make this new algorithm suitable for detection of homologs in conjunction with a wide range of score functions and gap penalties.
    Ph.D. Biochemistry; Biological Sciences; Molecular biology; Pure Sciences. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/127833/2/3029357.pd
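As a rough illustration of the two ingredients this abstract mentions, the Python sketch below computes a Smith-Waterman local alignment score and converts a score into a p-value under an extreme value distribution. It is not the OPTIMA score function and not the hybrid algorithm. The match, mismatch, and linear gap values are placeholder assumptions (real protein comparison would normally use a substitution matrix and affine gap penalties), the p-value uses the standard Karlin-Altschul form P = 1 - exp(-K m n e^(-lambda S)), and the constant `k` and both function names are hypothetical.

```python
import math


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def evd_pvalue(score: float, lam: float, k: float, m: int, n: int) -> float:
    """P(S >= score) for sequences of lengths m and n under an EVD null.

    lam and k are the distribution parameters; k here is a made-up input,
    not a value taken from the thesis.
    """
    expected_hits = k * m * n * math.exp(-lam * score)
    return -math.expm1(-expected_hits)  # equals 1 - exp(-expected_hits)


s = smith_waterman_score("XXACGTXX", "YYACGTYY")
print(s, evd_pvalue(s, lam=1.0, k=0.1, m=8, n=8))
```

With lambda fixed near 1, as reported for hybrid, only the remaining parameter needs to be estimated, which is one way to read the abstract's point about needing only a few simulations for accurate statistics.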